Improving Neural Network Training in Low Dimensional Random Bases
Stochastic Gradient Descent (SGD) has proven to be remarkably effective in optimizing deep neural networks that employ ever-larger numbers of parameters. Yet, improving the efficiency of large-scale optimization remains a vital and highly active area of research. Recent work has shown that deep neural networks can be optimized in randomly-projected subspaces of much smaller dimensionality than their native parameter space. While such training is promising for more efficient and scalable optimization schemes, its practical application is limited by inferior optimization performance. Here, we improve on recent random subspace approaches as follows.
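To make the random-subspace idea in the abstract concrete, here is a minimal NumPy sketch of optimizing in a fixed randomly-projected subspace. It is an illustration under stated assumptions, not the authors' implementation: the function name `train_in_fixed_subspace`, the Gaussian projection with 1/sqrt(D) scaling, the toy quadratic loss, and all hyperparameters are hypothetical choices made for this example.

```python
import numpy as np

def train_in_fixed_subspace(grad_fn, theta0, d, steps=200, lr=0.1, seed=0):
    """Gradient descent restricted to theta = theta0 + P @ c.

    P is a fixed D x d random matrix (assumed Gaussian here); only the
    d coefficients in c are trained, not the D native parameters.
    """
    rng = np.random.default_rng(seed)
    D = theta0.size
    # Assumed scaling: columns have roughly unit norm.
    P = rng.standard_normal((D, d)) / np.sqrt(D)
    c = np.zeros(d)
    for _ in range(steps):
        theta = theta0 + P @ c
        # Chain rule: dL/dc = P^T dL/dtheta.
        c -= lr * (P.T @ grad_fn(theta))
    return theta0 + P @ c

# Toy problem (an assumption for illustration): L(theta) = 0.5 * ||theta - target||^2.
target = np.ones(100)
loss = lambda theta: 0.5 * np.sum((theta - target) ** 2)
grad = lambda theta: theta - target

theta_init = np.zeros(100)
theta_final = train_in_fixed_subspace(grad, theta_init, d=10)
print(loss(theta_init), loss(theta_final))
```

In this toy setting the loss decreases but cannot reach zero, because the iterate can only move within the 10-dimensional span of `P`. This is one simple way to see why a fixed subspace can limit optimization performance.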
Review for NeurIPS paper: Improving Neural Network Training in Low Dimensional Random Bases
Weaknesses: What I found most worrying about this paper is that the FPD CIFAR-10 results do not seem to be consistent with the FPD paper [23]. In [23], FPD appears to achieve 90% of the original performance with a 20-fold reduction in parameters for the LeNet model (Table 1 in [23]), while Table 1 of this manuscript reaches only 60% of the performance with only a 10-fold reduction in parameters. Similarly, [23] mentions that ResNet appears to be more parameter-efficient than the LeNet architecture, which suggests that FPD should generally work much better in this case. This makes me wonder whether there is some underlying issue in the authors' implementation. If so, the observed improvement of the RBD method might not hold once the FPD baseline is fixed.
Review for NeurIPS paper: Improving Neural Network Training in Low Dimensional Random Bases
The reviewers were consistent in their appreciation of the paper: it demonstrates clear improvements over the ICLR work [23], and the inconsistencies that originally worried some reviewers were clarified by the rebuttal. Drawing a new subspace every iteration appears to be novel for the neural net application, though the authors point out connections with ES (and the AC notes connections to the DFO community literature as well, e.g., https://arxiv.org/abs/2003.02684 and https://arxiv.org/abs/1905.01332). The reviewers also liked the compartmentalization idea. To summarize, though the reviewers' initial response was only mildly positive, after the rebuttal and our discussions the reviewers think this paper empirically shows a significant improvement over prior work.
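The idea of drawing a new subspace every iteration, mentioned in the meta-review, can be sketched in a few lines of NumPy. This is a hedged illustration only: the function name `train_with_redrawn_bases`, the Gaussian basis with 1/sqrt(d) scaling (chosen so that the projected step equals the full gradient in expectation), the toy quadratic loss, and the step size are all assumptions for this example and are not taken from the paper.

```python
import numpy as np

def train_with_redrawn_bases(grad_fn, theta0, d, steps=300, lr=0.05, seed=0):
    """Gradient descent where each step uses a freshly sampled d-dim basis.

    At every iteration the gradient is projected onto a new random
    subspace and the parameters move only within that subspace.
    """
    rng = np.random.default_rng(seed)
    theta = theta0.astype(float)
    D = theta.size
    for _ in range(steps):
        # New basis each step; assumed scaling gives E[P @ P.T] = I.
        P = rng.standard_normal((D, d)) / np.sqrt(d)
        coeffs = P.T @ grad_fn(theta)   # d-dimensional projected gradient
        theta -= lr * (P @ coeffs)      # step back in the native space
    return theta

# Toy problem (an assumption for illustration): L(theta) = 0.5 * ||theta - target||^2.
target = np.ones(100)
loss = lambda theta: 0.5 * np.sum((theta - target) ** 2)
grad = lambda theta: theta - target

theta_final = train_with_redrawn_bases(grad, np.zeros(100), d=10)
print(loss(np.zeros(100)), loss(theta_final))
```

Because the subspace changes at every step, the iterates in this toy setting are not confined to a single d-dimensional slice of parameter space, even though each individual update uses only `d` coefficients.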